
[storage] Atomic alter table in the storage controller #31408

Merged: 1 commit, Feb 18, 2025

Conversation

@bkirwi (Contributor) commented Feb 7, 2025

This avoids a possible race in the storage controller, where a downgrade of our read capability can race the selection of new capabilities for a new version of the table. (More detail in a comment downthread.)

We instead switch to:

  • using the frontiers of the old version directly;
  • doing the frontier selection and the state update atomically while holding the lock;

...which together should avoid this particular race.
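
For illustration only, here is a minimal sketch of the shape of the change, using hypothetical names and plain integers in place of the storage controller's actual types: the new version's since is taken directly from the old version's in-memory read capabilities, and the new collection is installed before the lock is released.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};

// Hypothetical stand-ins for the storage controller's collection state.
type Timestamp = u64;
type GlobalId = u64;

#[derive(Clone, Debug)]
struct CollectionState {
    /// The frontier of this collection's read capabilities.
    read_capability: Timestamp,
}

#[derive(Default)]
struct Collections {
    collections: BTreeMap<GlobalId, CollectionState>,
}

/// Sketch of the atomic alter-table path: the new version's since comes
/// straight from the old version's in-memory capabilities, and the new
/// collection is registered while the same lock is still held, so no
/// concurrent downgrade can slip in between frontier selection and the
/// state update.
fn alter_table_atomic(state: &Arc<Mutex<Collections>>, old_id: GlobalId, new_id: GlobalId) {
    let mut state = state.lock().expect("lock poisoned");
    let since = state
        .collections
        .get(&old_id)
        .expect("old collection exists")
        .read_capability;
    // Install the new version with the frontier we just read, before
    // releasing the lock, so both steps are observed atomically.
    state
        .collections
        .insert(new_id, CollectionState { read_capability: since });
}

fn main() {
    let state = Arc::new(Mutex::new(Collections::default()));
    state
        .lock()
        .unwrap()
        .collections
        .insert(1, CollectionState { read_capability: 10 });
    alter_table_atomic(&state, 1, 2);
    println!("{:?}", state.lock().unwrap().collections.get(&2));
}
```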

Motivation

https://github.com/MaterializeInc/database-issues/issues/8952

Tips for reviewer

This bug is quite tricky to reproduce, so I'm not 100% confident that this is the issue we're seeing in CI. Happy to take suggestions for improving the tests! But also please check my logic below: that it sounds like an accurate description of the current behaviour, and that the particular interleaving of events it imagines would explain the symptoms we see.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@bkirwi force-pushed the alter-table branch 2 times, most recently from a4ee72f to 016a188, on February 14, 2025 00:10
@bkirwi (Contributor, Author) commented Feb 14, 2025

A hypothesis

At the storage collections level, alter table is roughly:

  1. Grab a snapshot of the frontiers of the previous version, as known to the storage controller.
  2. Evolve the schema of the shard.
  3. Obtain new handles for the shard. Notably, these frontiers are not advanced: they’re based on whatever’s currently durable in Persist.
  4. Generate new metadata for the collection. The initial since hold of this collection is:
    a. the current critical since of the shard (read from Persist);
    b. the overall frontier of the read capabilities obtained in step 1.
  5. Atomically modify the controller state: add the new version and point the old version to it. Among other things, this means that only changes to the new version’s frontiers get sent to the background worker and synced back to Persist.

Observations:

  • The frontier we get in 3 will generally be behind what we got in 1, since we write sinces back to Persist asynchronously. Even if we ignore the "maybe" in maybe_downgrade_since, there's an interval between when the frontier changes in our in-memory state and when it gets dequeued by the background worker and written to consensus state.
  • However, it’s also possible for the frontier we get in 3 to be ahead, since this whole setup is not atomic, and we might have downgraded the critical handle in the meantime.

So: suppose we ALTER TABLE right as we downgrade the since of our table from t=10 to t=20, enqueueing but not applying a downgrade. In that case, we might see t=20 in step 1 but t=10 in step 3. Then, as we go to write the new metadata (with an overall since of t=10) into our storage collections state, the background worker might finally apply the downgrade to t=20, making t=10 permanently unreadable even though the storage controller still thinks it can issue handles at that frontier for the new version of the table.
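
To make the interleaving concrete, here's a toy model of the sequence above (hypothetical names and plain integers for timestamps; no Persist involved), showing how the overall since chosen for the new version can end up below the shard's actual critical since:

```rust
use std::collections::VecDeque;

fn main() {
    // Step 1: the controller's in-memory read capability has already moved to t=20...
    let in_memory_capability: u64 = 20;
    // ...but the durable critical since still says t=10, because the downgrade
    // to t=20 is only enqueued for the background worker, not yet applied.
    let mut durable_critical_since: u64 = 10;
    let mut pending_downgrades: VecDeque<u64> = VecDeque::from(vec![20]);

    // Steps 3/4: the alter path reads the durable since, which lags behind,
    // and that becomes the overall since of the new metadata.
    let new_version_since = durable_critical_since; // t=10

    // Now the background worker drains its queue and applies the downgrade.
    while let Some(t) = pending_downgrades.pop_front() {
        durable_critical_since = durable_critical_since.max(t); // now t=20
    }

    // Step 5 has installed a collection that claims readability at t=10,
    // but the shard is only readable from t=20.
    assert!(new_version_since < durable_critical_since);
    println!(
        "controller since = {}, shard readable from = {} (in-memory capability was {})",
        new_version_since, durable_critical_since, in_memory_capability
    );
}
```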

@bkirwi bkirwi changed the title [wip] Debug alter table [storage] Atomic alter table in the storage controller Feb 14, 2025
@bkirwi bkirwi marked this pull request as ready for review February 14, 2025 18:25
@bkirwi bkirwi requested a review from a team as a code owner February 14, 2025 18:25
@bkirwi bkirwi requested a review from aljoscha February 14, 2025 21:12
@aljoscha (Contributor) left a comment


Yeah, I think this can happen! At first I was skeptical because I thought re-acquiring the since handle would protect us, because I mistakenly thought that someone somewhere would have to fail because of a mismatched since when trying to downgrade. But, we only compare against the epoch: when downgrading a critical since handle we don't check that the actual since frontier is what we expect. So in the alter table situation it can (does) happen that there are two valid critical since handles in the process that can downgrade the since (the one we freshly acquired, and the one in the background worker), and they both map to the one logical since handle (the one since handle ID).
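
As a toy illustration of why the epoch check alone doesn't fence this (a hypothetical API, not Persist's real one): a compare-and-downgrade that validates only an opaque epoch will accept a downgrade from either of two live handles sharing the same handle ID, even if one of them is working from a stale view of the frontier.

```rust
/// A toy critical-since shard: downgrades are fenced only by an epoch
/// (an opaque token), not by the frontier the caller expects to see.
struct CriticalSince {
    epoch: u64,
    since: u64,
}

impl CriticalSince {
    /// Hypothetical compare-and-downgrade: succeeds whenever the caller's
    /// epoch matches, regardless of what the caller believes the since is.
    fn compare_and_downgrade(&mut self, epoch: u64, new_since: u64) -> Result<(), u64> {
        if epoch != self.epoch {
            return Err(self.epoch);
        }
        self.since = self.since.max(new_since);
        Ok(())
    }
}

fn main() {
    let mut shard = CriticalSince { epoch: 7, since: 10 };

    // Two live handles in the same process, same handle ID, same epoch:
    // the freshly acquired one (alter table) and the background worker's.
    let fresh_handle_epoch = 7;
    let background_handle_epoch = 7;

    // The alter path plans around since = 10...
    let planned_since = shard.since;

    // ...but both handles' downgrades are accepted, because only the epoch is checked.
    shard.compare_and_downgrade(background_handle_epoch, 20).unwrap();
    shard.compare_and_downgrade(fresh_handle_epoch, 15).unwrap(); // accepted, but a no-op

    // The planned since is now below what the shard will actually serve.
    assert!(planned_since < shard.since);
    println!("planned since = {}, actual since = {}", planned_since, shard.since);
}
```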

Very nice find! 🙌

@bkirwi (Contributor, Author) commented Feb 17, 2025

Great! So far the only nightly errors are either also present on main or look like flakes (they don't alter any tables, so this code wouldn't run), but I'll try to confirm that before merging.

@bkirwi bkirwi merged commit 2519925 into MaterializeInc:main Feb 18, 2025
170 of 172 checks passed
@bkirwi bkirwi deleted the alter-table branch February 18, 2025 15:46